op {
  graph_op_name: "CrossReplicaSum"
  visibility: HIDDEN
  in_arg {
    name: "input"
    description: <<END
The local input to the sum.
END
  }
  in_arg {
    name: "group_assignment"
    description: <<END
An int32 tensor with shape
[num_groups, num_replicas_per_group]. `group_assignment[i]` represents the
replica ids in the ith subgroup.
END
  }
  out_arg {
    name: "output"
    description: <<END
The sum of all the distributed inputs.
END
  }
  attr {
    name: "T"
    description: <<END
The type of elements to be summed.
END
  }
  summary: "An Op to sum inputs across replicated TPU instances."
  description: <<END
Each instance supplies its own input.

For example, suppose there are 8 TPU instances: `[A, B, C, D, E, F, G, H]`.
Passing group_assignment=`[[0,2,4,6],[1,3,5,7]]` sets `A, C, E, G` as group 0,
and `B, D, F, H` as group 1. Thus we get the outputs:
`[A+C+E+G, B+D+F+H, A+C+E+G, B+D+F+H, A+C+E+G, B+D+F+H, A+C+E+G, B+D+F+H]`.
END
}
